TransNeural: An Enhanced-Transformer-Based Performance Pre-Validation Model for Split Learning Tasks

While digital twin networks (DTNs) can potentially estimate network strategy performance in pre-validation environments, they are still in their infancy for split learning (SL) tasks, facing challenges like unknown non-i.i.d. data distributions, inaccurate channel states, and misreported resource availability across devices. To address these challenges, this paper proposes a TransNeural algorithm for DTN pre-validation environment to estimate SL latency and convergence. First, the TransNeural algorithm integrates transformers to efficiently model data similarities between different devices, considering different data distributions and device participate sequence greatly influence SL training convergence. Second, it leverages neural network to automatically establish the complex relationships between SL latency and convergence with data distributions, wireless and computing resources, dataset sizes, and training iterations. Deviations in user reports are also accounted for in the estimation process. Simulations show that the TransNeural algorithm improves latency estimation accuracy by 9.3% and convergence estimation accuracy by 22.4% compared to traditional equation-based methods.


Introduction
In the era of the sixth generation mobile network (6G) [1,2], an increasing number of artificial intelligence (AI) applications will rely on edge network to train their continually expanding models based on massive amounts of user data [3].While users may be willing to provide their data with payment, they are concerned about data privacy issues.Meanwhile, fully training AI models on user devices to upload only model parameters can address privacy issues, but is often impractical due to constraint user resources [4].Additionally, companies are reluctant to fully disclose their model parameters, since these are valuable assets [5].In this context, split learning (SL) has emerged as a promising solution [6].In "SL for model training process", an edge server (ES) connected with an access point (AP) offloads only a few lower layers of AI models to a user device for local training, where the device interacts intermediate results in forward and backward propagations with ES to update the upper layers.Once a user device finishes updating using its own data, it sends the updated lower layers to the next user device via AP, continuing this process until all participants have contributed their data.This model training method protects user privacy and conserves device resources, and we abbreviate it as "SL tasks".The applications include illness model training in healthcare [7] and user behavior analysis in mobile networks [8], etc.With the adoption of SL technology, a crucial concern for AI application companies is to pre-estimate SL latency and convergence performance under given resource allocation strategy before paying users and operators for data, wireless, and computing resources [6].However, under unknown non-i.i.d.data distribution [9], inaccurate wireless channel state information (CSI), and misreported available computing resources on different user devices, it becomes extremely difficult to accurately estimate the SL performance before real physical executions.The reason is as follows.Suppose there are three participate user devices: A, B, and C. If the data distribution on A is similar to that on C, but significantly different from B, then the training sequences A-C-B and A-B-C will result in considerable differences in SL training convergence, despite utilizing the same amount of resources.The situation becomes even more problematic when there exists reporting deviations in CSI, amount of data and computing resources on devices.Consequently, AI application companies face high risks when investing heavily in network operators and user devices for SL training on edge, without confidence in achieving their training latency and convergence goals, which significantly dampens their enthusiasm.
The Digital Twin Network (DTN), emerging as a critical 6G technology [10], is an intelligent digital replica of a physical network that synchronizes physical states in real-time and uses AI models to analyze the characteristics and relationships of network components for accurate inferences in 6G network [11].In the context of SL, DTN is expected to automatically model data similarities between devices, account for user misreporting and correct inaccurate CSI based on the unique profiles of different devices.Moreover, it can learn the complex relationships between service performance and network parameters.These functions can be realized based on AI models and form a pre-validation environment in DTN for testing SL strategy performance.Based on the pre-validation results, DTN can repeatedly optimize the SL strategy until it meets latency and convergence requirements before deployment in the physical world.
Nevertheless, several challenges exist in establishing DTN pre-validation environment for SL training tasks.First, learning data similarities between different physical devices is difficult due to data privacy issues, which prevent the acquisition and analysis of user data.Second, accurately estimating the misreported behaviors of each physical device is challenging.These deviations depend on multiple factors such as device version, user behavior, movement patterns, etc.However, the actual physical information is unattainable due to user privacy, data collection errors, and other factors, making it impossible to learn errors from historical data.Third, it is difficult to model the complex relationships between various SL training performance metrics and network factors like data similarities, dataset size, training iterations, device resources, wireless channel quality, and edge server resources, particularly when unknown parameters exist.Fourth, the model needs to be broadly applicable to various SL requirements (such as latency, convergence, and energy efficiency) and different tasks.This would enable an autonomous modeling process for prevalidation environments, advancing toward the ultimate goal of realizing fully automatic network management in DTNs [12].
Existing AI models for the DTN pre-validation environment can be divided into three categories.First, long short-term memory (LSTM) networks are used to establish DTN models for predicting future states.For example, the authors of [13] used LSTM to build a network traffic prediction model for DTN based on historical traffic data from a real physical network, which can be utilized for network optimization.Similarly, the authors of [14] employed LSTM to model and predict changes in reference signal receiving power, aiming to reduce handover failures in DTN.Second, graph neural network (GNNs) are used to model network topology for strategy pre-validation.For instance, the authors of [15] designed a GNN-based pre-validation environment for DTN to estimate delay and jitter under given queuing and routing strategies.Likewise, the authors of [16] proposed a GNNbased pre-validation model to estimate end-to-end latencies for different network slicing schemes.Third, a combination of GNN and transformer models is used to capture both temporal and spatial relationships among network components.For example, the authors of [17] developed a DTN pre-validation environment that predicts end-to-end quality of service for resource allocation strategies in cloud-native micro-service architectures.
Although existing methods are important explorations for building an intelligent DTN pre-validation environment, several problems persist in the context of SL training tasks.First, LSTM and GNN-based models are only adept at learning the relationships between adjacent nodes, where the learning performance drops quickly as the device separation grows.This property makes them suitable for scenarios where only the relationships between adjacent nodes are critical, such as time-series predictions [18].However, quickly learning the data relationships between all devices is crucial for accurate SL performance estimation, since every device has the potential to participate with a random sequence.Second, while transformer-based models excel in learning inter-node relationships, they may struggle with understanding the relationships within the feature vector of a single node.This capability is vital for accurately modeling and predicting the performance of SL training tasks in wireless network.
Based on these considerations, this paper presents a TransNeural algorithm for DTN pre-validating environment towards SL training tasks.This algorithm integrates transformers with neural network, leveraging the transformer's strength in inter-node learning [19] to capture data similarities among different user devices and the neural network's strength in inter-feature learning [20] to understand the complex relationships between various system variables, SL training performances, and deviation characteristics between different nodes.The contributions of this work can be summarized as follows: The remainder of our work is organized as follows.Section 2 gives the system model.Section 3 designs the TransNeural algorithm in DTN for estimating the latency and convergence of SL training tasks.Simulation results and discussions are given in Section 4. Finally, we conclude this paper in Section 5.

System Description
In this section, we sequentially present the system model, communication model, SL process and DTN.

System Model
Figure 1 gives the whole system of DTN-integrated network for SL training tasks.In our system, there are N user devices access to an AP equipped with an edge server (ES).Each device is characterized by some unique features, including data distributions, the amount of available computing resources, misreporting behaviors, etc.On the n-th device, the dataset is denoted as Q n with total size Q sum n = ||Q n ||; the amount of available computing resource is denoted as f n .The data similarities between different devices depend on their users habits (such as shopping habits) and behaviors.The distance between the n-th device with AP is denoted as d.The index of key notations are listed in Table 1.There is a DTN deployed on the ES, denoted as D. It synchronizes network states via AP, and generates network strategies (such as resource allocation strategies) according to requirements of oncoming tasks (such as SL training tasks).Specifically, it has a prevalidation environment for testing and optimizing strategies before they are executed in the real physical network, forming a closed-loop strategy optimization in DTN.The contribution of our work is concentrated on the establishment of the pre-validation environment for SL training tasks.
In SL training process, a target AI model is partitioned into "upper layers" and "lower layers".Under a given resource allocation strategy, the AP would sequentially offload the lower layers to the participate devices, which are responsible for computing forward and backward propagation for lower layers.The updated lower layers would be transmitted to the next participate device via AP until all participants have contributed and the final updated lower layers will be sent back to ES to merge with upper layer.This process can train the target AI model while protecting user data privacy and conserving device resources.Expected transmission rate in downlink and uplink; Real transmission rate in downlink and uplink

Communication Model
The AP is equipped with M antenna, and each user device is equipped with one antenna.The transmission rate of each user device is where B n is the allocated bandwidth for the n-th device, i = {up, dn} represent uplink and downlink, P n,i,t is the instant uplink or downlink transmitting powers between the n-th device and AP in the t-th time slot, N 0 is the power spectral density of noise, is the instant channel gain, where L n is large-scale fading and h n,t is small-scale fading.
The channel gain H n,t is a circularly symmetric complex Gaussian with zero mean and variance σ 2 n [21], which randomly changes across different time slots.The probability density functions (PDF) of H n,t in uplink and downlink are given in Theorem 1.
Theorem 1 (PDF for SIMO and MISO channels).The PDF of channel gain for SIMO and MISO can be derived as u , according to [22,23].
To maximize transmission rate and cancel the influence from small-scale channel gain changing, we adopt a small-scale power adaptation scheme from [24] on each time slot.That is, the uplink allocated transmitting power P n of device n and the downlink transmitting power P AP of AP are taken as the uplink and downlink average power between different time slots.The scheme adapts the instantaneous power according to the channel gain in the t-th time slot by where H L is the lower bound of channel gain, P n,i,tar is target receiving power.Take the uplink transmission process of the n-th device as an example, the relationship between the target receiving power with the average transmitting power P n is where Γ u (•) and Γ D (•) are the upper and lower incomplete Gamma functions, respectively.Therefore, the target receiving power at AP and the n-th device can be derived as By introducing the small-scale power adaptation scheme, the maximized transmission rate is converted from Equation (1) to

Split Learning Process
The SL process trains its target AI model based on a given resource allocation scheme, which chooses a set of user devices to participate the SL training process and decide their working sequence.We use a variable s k (k = 1, 2, • • • , K) to contain the device number, which is assigned as the k-th working device participating in the SL training process.Then, the target AI model is split into lower layers M APP lw and upper layers M APP i .The bit-size of M APP lw is b lw .Then, the lower layer M APP lw is offloaded to the device whose number is s 1 .Then, the s 1 -th device extract a mini-batch from its local dataset to perform forward propagation with computing workload (Q s k • c forward lw ), where Q s k is the size of a minibatch, which need to be smaller than the total amount of data (Q s k < Q sum s k ) on the s k -th device.Then, it passes the middle parameters of forward propagation to AP with bit-size (Q s k • b mid,fw ), where b mid,fw is the bit-size of the middle results of one training sample.Based on the received middle parameters, the ES continues to perform forward propagation for upper layers with computing workload (Q s k • c up,fw ).After that, it computes backward propagation for upper layers with computing workload (Q s k • c up,bk ).Then, it sends the middle parameter of backward propagation back to user device via AP, with bitsize (Q s k • b mid,bk ).Next, user device computes backward propagation with computing workload (Q s k • c lw,bk ).This process iterates I k times.After that, the s k user device would send the updated lower layers to AP, which sends them to the next participate user s k+1 to continue SL training process, until all participants has trained the target AI model.

Digital Twin Network
The DTN can be model as where O, G, E, Z are network management module, network strategy module, pre-validation environment and network database, respectively.Among them, the network management module O is responsible for analyzing the requirements of different tasks (such as SL tasks), and orchestrating different modules to reach the requirements.Network strategy module G can generate resource allocation solutions based on intelligent algorithms (such as deep reinforcement learning algorithms, generative AI algorithms, etc.), whose performance can be tested in the pre-validation environment E, and based on the testing results, the module G can improve its strategy.The design of network strategy module is beyond the scope of this work.Specifically, this paper focus on the establishment of pre-validation environment E, which needs to estimate the performance of a resource allocation solution for SL training tasks, which is essential to guarantee the SL performance.Its input includes current network states and resource allocation solution; its output is the SL performance metrics such as latency and convergence.Notice that the estimating process is challenge because of unknown data distributions and misreported wireless channel qualities and available computing resources.Finally, network database Z can be expressed as where the superscript "*" represents that the recorded data in the DTN has some deviation from the actual data.L * n is the recorded large-scale fading of the n-th device in the DTN, corresponding to a deviated root square variant σ * n of channel gain.Notice that the DTN does not need to collect the small-scale fading, since we deploy a small-scale power adaptation scheme on user devices and AP to deal with it in Section 2.2.f * av,n is the recorded available amount of computing resources of the n-th device in the DTN, which may have error from the real amount of computing resources f av,n .The error becomes from user misreported behavior, data collecting error, and sudden appeared emergency tasks.Under a computing resource allocation solution f n generated by network strategy module G, the actual occupied computing resource on the n-th device is inaccurate compared with the allocated computing resource f n , which can be expressed as

Transformer-Based Pre-Validation Model for DTN
The aim of this paper is to establish a pre-validation environment for estimating latency and convergence of SL training tasks under given network states and strategies.To do this, this section first analyzes the relationships between the estimation objects, i.e., SL training latency and convergence, with network states and strategies from mathematical view, where several unknown parameters exist and makes the estimation process difficult.To cope with the difficulties, we propose a TransNeural algorithm which combines transformer and neural network to learn inter-nodes and inter-feature relationships with unknown parameters.

Problem Analysis for SL Convergence Estimation
The SL training process comprises iterative wireless transmission process and computing process on AP and different devices.Without causing ambiguity, we use divergence ε to replace convergence (1 − ε).First, the wireless outage can cause transmission failure in middle parameter forward and backward propagations.Based on the small-scale power adaptation scheme in Section 2.2 and PDF of channel gain in Theorem 1, the outage probability in different wireless transmission process can be expressed as The outage probability is inaccurate because the root square variant σ * n of channel gain is inaccurate.Because of outage probability in uplink and downlink, the usable amount of data in a mini-batch in one iteration changes from In addition, the original divergency rate in computing process can be expressed as where µ ⋎ is an unknown divergence parameter depending on the data distribution on different devices.The superscript "⋎" denotes that the value of a variable is unknown.Then, considering outage probability in wireless transmission process, the finally divergence on s k -th device is The overall training divergence throughout different devices needs to consider data similarity among different devices, which can be expressed with an unknown data correlation matrix where is the data similarity vector of device k.Then, the overall divergence can be expressed as

Problem Analysis for SL Latency Estimation
The latency of SL training process is the sum of transmitting and computing latencies on AP and different participate devices.In detail, first, the lower layers M APP lw with bit-size b lw is wirelessly transmitted to s k -th device, where the downlink transmission latency is where the transmission rate is inaccurate.The reason is that the root square variant σ * n of channel gain is inaccurate, which leads the calculated target power P n,dn,tar to be inaccurate according to Equation (4); thus, the transmission rate R * n,dn is inaccurate according to Equation (5).Then, the s k -th user device computes forward propagation with computing latency where f * s k is actual allocated computing resources on the s k -th device, which is inaccurate according to Equation (8).Then, the middle results of forward propagation is transmitted to AP with uplink transmission latency The AP performs forward propagation for the upper layers using the accepted middle data.Then, it computes the loss function and performs backward propagation for the upper layers.The total computing latency of this forward and backward propagation process is where outage probability in uplink transmission process leads an average of (ε s k ,tr • Q s k ) become unusable.After that, AP sends the backward middle results to user device with latency Then, user device performs local backward propagation with computing latency where outage probability in downlink transmission process leads another ε * s k ,tr rate of unstable data.The above process iterates I s k times.Finally, user device upload lower layers to AP with latency Therefore, the total latency with one device is The total latency of SL training process with all participate devices is To accurately estimate SL training latency and divergence, a pre-validation model needs to simultaneously model the inter-node relationships, the relationships between different features and estimate output, and the unknown or inaccurate parameters, which is challenging work.In detail, first, the data correlation between adjacent participating devices greatly influence the overall training divergence, according to Equation (13).Second, the relationships between network states (such as data similarities, wireless channel quality, bit-size of lower layers b lw , etc.) and network resource allocation variables (such as size of mini-batch Q s k , training iteration I s k , wireless transmitting powers P n , etc.) with the training divergence and latency need to be intelligently and automatically established to realize automatic network management in DTN, where the relationships are obviously complex according to Equations ( 13) and (22).Third, the unknown parameters (such as divergence parameter µ ⋎ s k ) and inaccurate parameters (such as root square variant σ * n of channel gain) need to learn without collecting their accurate values.
To do this, we propose a TransNeural algorithm, which combines transformer with neural network to automatically learn the acquired models.The whole architecture is shown in Figure 2. First, we use neural-based feature extracting layers to automatically classify which group of features needs to model their relationships between different devices, and which group of features needs to model their inner-relationship.Then, we design encoderbased line to learn the inter-node relationships and neural-based line to learn inter-feature relationships, respectively.Finally, the outputs of two lines are aggregated using neural network to further learn the complex relationship between two lines with the estimate outputs.In addition, thanks to the added neural network in different positions, the model can learn the unknown or inaccurate parameters automatically.The proposed TransNeural algorithm can be expressed as a function where all of the elements are mean normalized.The loss function is defined as where B is the size of training batch-size for the TransNeural algorithm.

Feature Extracting Layers
Feature extracting layers aim to automatically project different features into two groups, deciding which group of features are highly combined with inter-node relationships (such as device number), and which contribute to inter-feature learning for establishing complex relationships between network strategy and performances.Elements in different groups may overlap.We use full connected layers to construct such feature extracting layers, which can be expressed as a function by

Positional Encoding Layer
In the SL training task, the device participation sequence greatly influences the training performance, because the initial training divergence on a new device largely depends on its data similarity with the previous training device.Therefore, we need to take the device sequence into consideration to model the relationship between strategies and SL performance.Thus, a positional encoding layer is introduced before the encoder layers.Consistent with classical settings, the expression of positional encoding is based on sine and cosine functions PE (k,2i) = sin(k/10, 000 2i/d model ) where k is the participating sequence of a device, d model is the input dimension of encoder layer, and i denotes different input neural of the encoder layer.

Encoder Layers
The encoder layers use the classical components in encoders, including K, Q, V matrices to learn the inter-node relationships by We apply multi-head attentions to learn different kind of relationships between devices by where Finally, the encoder layer can expressed as Modeling data similarities C s k and divergency parameter µ s k . (29)

Inter-Feature Learning Layers
In the second line, we design inter-feature learning layers to non-linearly produce some key elements for estimating SL performance based on a part of the features.Considering that neural networks are inherently good at extracting the features of different levels in their layers to finally establish the complex relationships between input and output, we simply use fully connected neural network to construct the inter-feature learning layers.That is, in our scenario, the inter-feature learning layers can output wireless outage probability based on the input channel quality, and output SL latency based on the input transmitting power, computing resources, etc.At the same time, it can automatically modify the error in inaccurate computing resource and channel gain based on the device number.The function can be expressed as Modeling relationships and learn accurate H s k , f s k . (30)

Estimation Generating Layer
Two lines are finally combined in the fully connected layers, which would jointly consider the inter-node relationships and inter-feature relationships to produce the final estimated system performance, i.e., the SL training latency and accuracy in this scenario.

Simulation
In this section, we evaluate the proposed TransNeural algorithm in Python 3.7.6.We suppose that there are N (N ∈ [10,30]) devices access to an AP, where the path loss model is 38.46 + 20 log 10 (d) [25], d is the distance between device and AP.The communication frequency is 2.6 GHz.Each device has a unique identity number from 1 to N, which is uniquely projected to a virtual device with that number in DTN.In addition, each device has a series of unique characteristics, which are unknown and about to be automatically learned by DTN From the parameter settings, it can be seen that the range of value for every parameter is obviously large, with the aim to include as many as network conditions as possible.Notice that, in an SL training process, not all devices will be selected to join the training process.However, the DTN model needs to learn the characteristics of all device to better estimate the training performance for every given resource allocation strategies.Without causing ambiguity, we use SL divergence to replace SL convergence in the simulation section, to make analysis more clearly.Other parameters are given in Table 2. Details of model in TransNeural algorithm are setting as follows.The dimension of input array is 5, which is the feature dimension of one device.The structure of the feature extracting layer is 64 × 2, whose first layer and second layer take ReLu function and Linear function as the activation functions, respectively.The encoder block has four encoder layers, where each layer has eight heads for multi-head attention, and the dimension of input vector is 64.The structure of inter-feature learning layer is 64 × 2, whose layers take ReLu function as the activation functions.The structure of estimation generating layer is 64 × 2, whose first layer and second layer take ReLu function and Linear function as the activation functions, respectively.
The baselines are set as follows: • Equation-based algorithm: this algorithm estimates latency and divergency based on the equations in Sections 3.1 and 3.2.

•
LSTMNeural algorithm: We merge the long short-term memory (LSTM) and neural network to form a LSTMNeural algorithm.That is, in the architecture of proposed TransNeural algorithm, the PE layer and encoder layer are replaced by an LSTM.The input dimension of the LSTM is 16.The architecture of its hidden layer is 32 × 4. Figure 3 gives learning convergence of the LSTMNeural and proposed TransNeural algorithm.The figure indicates that the TransNeural algorithm converges faster than the LSTMNeural algorithm, and can acquire a lower loss.The reason lies in the fact that, on each data sample, LSTM can only learn the data correlations between adjacent participate devices, while transformer can learn the data correlations between all of the participate devices.Therefore, TransNeural algorithm has a much higher learning efficiency to estimate network performance.In addition, the figure indicates that the TransNeural algorithm converges fast under different size of SL participant group.Figures 4 and 5 give the total SL latency and estimation error with growing average distance between selected user devices and AP.In general, the wireless transmission rate will first drop as the distance grows.Considering the wireless transmission happens frequently in each forward and backward propagation process, the transmission delay will significantly influence the total SL training delay, which leads the total SL delay increases as the distance grows.From the figure, since the DTN may not acquire accurate CSI, latency estimation based on traditional equation-based method will have a high error, especially under larger distance where CSI error becomes sensitive for latency estimation.In comparison, the proposed TransNeural algorithm has a stable error with distance growing, which stays under 240 s per omit-table because the total error is 2100-2300 s.However, LSTMNeural has a large estimation error because of its low learning efficiency.Figures 6 and 7 give the total SL latency and estimation error with a growing allocated computing resources on devices.Because the user device needs to use their allocated computing resources to compute forward and backward propagation of lower layers in SL target models, the computing latency will decrease with the increasing allocated computing resources, which leads the total latency decrease.However, because of user misreport information, the real allocated computing resources may be smaller than the amount of allocated resource in network strategies, which can not be modeled in traditional equation-based latency estimation methods, leading to estimation errors.In comparison, the proposed TransNeural algorithm and LSTMNeural algorithm can automatically learn these misreport behaviors from historical data, thus better estimate the real latency than traditional methods.Moreover, the TransNeural outperforms LSTMNeural because of high learning efficiency.Figure 8 gives the natural logarithm of the training divergence with growing average size of mini-batch used for training on user devices.The divergence drops as the size of mini-batch grows.Since the equation-based algorithm cannot learn the data correlations between device and takes the average correlation value to estimate divergence, its estimation has a large error compared with two other algorithms.In comparison, the proposed TransNeural algorithm can accurately predict the divergency under various size of mini-batch, thanks to its high learning ability.In addition, the figure indicates that the LSTMNeural algorithm has a higher estimation error than the proposed TransNeural algorithm.This is because that the LSTM algorithm cannot leverage dataset to learn data correlations between different devices efficiently compared with the transformer algorithm, as discussed earlier.
Outage probability, computing divergence, final divergence on the s k -th device C ⋎ , I s k Data correlation matrix, training iteration b lw , b up , b mid,fw , b mid,bk Bit-size of lower and upper layers in target model, bit-size of middle parameter in forward and backward propagation

Figure 2 .
Figure 2. Proposed TransNeural algorithm by combining encoder with neural network.

Figure 4 .
Figure 4. SL training latency vs. distances between devices and AP.

Figure 5 .
Figure 5. Estimation error of SL training latency vs. distances between devices and AP.

Figure 6 .
Figure 6.SL training latency vs. computing resources on devices.

Figure 7 .
Figure 7. Estimation error of SL training latency vs. computing resources on devices.

Figure 8 .
Figure 8. SL training divergence rate vs. average size of mini-batch used for training on devices.

Figure 9
Figure9gives the natural logarithm of training divergence with growing average distances between selected devices and AP.As distance grows, outage probability in wireless transmission process increases.It greatly influences the SL training divergence considering the dense wireless transmission process exists in SL training, which is also proved in the figure.In traditional equation-based algorithm, inaccurate CSI leads to a high divergency estimation error compared with the other two algorithm, which could eliminate error because of integrated neural network.

Figure 9 .
Figure 9. SL training divergence rate vs. distances between devices and AP.

Figure 10
Figure 10 gives the natural logarithm of training divergence with growing average training iterations among participating devices.In general, the divergence decreases as the training iteration increasing.The equation-based algorithm has a higher estimating error compared with two other algorithms because of lacking accurate data correlation parameters.Moreover, the proposed TransNeural algorithm outperforms the LSTMNeural algorithm because of high learning efficiency on given dataset.

Table 1 .
Index of key notations.
n Dataset on the n-th device, size of dataset on the n-th device D, O, G, E, Z DTN notation, network management module, network strategy module, pre-validation environment and network database in DTN f av,n , f * av,n ; f n , f * n Reported and real available computing resources, expected and real computing resources allocating strategy R n,dn , R n,up ; R * n,dn , R * n,up models.Their values are randomly set as follows: distance d * k ∈ [10, 900] m, average reporting deviation on distance ∆d k ⋎ ∈ [1, 10] m, average deviation between allocated computing resource and actual provided computing resource ∆ f k ⋎ ∈ [0.1, 1] GHz, divergence related parameter µ ⋎ k ∈ [1, 10], each element in data similarity vector